test(envelope-contract): pin producer→consumer envelope contracts (closes #114) by Lykhoyda · Pull Request #178 · Lykhoyda/rn-dev-agent

Lykhoyda · 2026-05-20T12:11:51Z

Summary

Closes #114 — adds 25 contract tests that pin the envelope shapes each dispatch tier emits against the handler-side parsers that consume them. A future divergence on either side fails fast in CI rather than slipping through to prod.

Background

Handler integration tests stub runAgentDevice via _setRunAgentDeviceForTest with synthetic envelopes. Codex flagged on PR #109 (conf 80) that this short-circuits the real wrapper's dispatch tiers, each of which produces subtly different shapes. The original issue listed agent-device's internal tiers (fast-runner HTTP / daemon socket / CLI subprocess); post-PR #164 (iOS-MVP) and PR #165 (Android-MVP) the surface widened to include the in-tree iOS and Android runner clients too. This PR pins the current surface.
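The coverage gap described above can be illustrated with a minimal sketch of a test seam. Only the names `runAgentDevice` and `_setRunAgentDeviceForTest` come from this PR; the signatures, the synchronous style, the `--fast` flag, and the tier-selection logic are assumptions made for illustration, not the wrapper's actual code.

```typescript
// Assumed envelope and runner signatures (hypothetical; the real wrapper is
// presumably async and takes richer arguments).
type Envelope = { ok: boolean; data?: unknown; error?: string; code?: string };
type Runner = (args: string[]) => Envelope;

// Placeholder tiers: each one returns a differently shaped success envelope.
const fastRunnerTier: Runner = () => ({ ok: true, data: { tree: { ref: "@e0", children: [] } } });
const daemonTier: Runner = () => ({ ok: true, data: { nodes: [] } });

// Hypothetical dispatch: the real selection rules are not shown in this PR.
function realRunAgentDevice(args: string[]): Envelope {
  const tier = args[0] === "--fast" ? fastRunnerTier : daemonTier;
  return tier(args);
}

let impl: Runner = realRunAgentDevice;

function runAgentDevice(args: string[]): Envelope {
  return impl(args);
}

// Test seam: passing null restores the real implementation.
function _setRunAgentDeviceForTest(stub: Runner | null): void {
  impl = stub ?? realRunAgentDevice;
}

// Once a stub is installed, no tier runs, so tier-specific shapes are never
// seen by the handler under test.
_setRunAgentDeviceForTest(() => ({ ok: true, data: { nodes: [] } }));
const stubbed = runAgentDevice(["--fast"]);
_setRunAgentDeviceForTest(null);
const real = runAgentDevice(["--fast"]);
```

With the stub installed, the `--fast` call returns the synthetic flat-nodes envelope rather than the nested tree, which is the short-circuit the contract tests are meant to compensate for.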

Producer fixtures pinned

Producer	Shape	Notes
In-tree iOS runner	`{ref: '@e<n>', type, rect, label?, identifier?, enabled?, hittable?}`	Flat-nodes; post-`mapRunnerNodesToFlat` normalization
In-tree Android runner	Identical to iOS flat-nodes shape	Parity test catches divergence
Legacy daemon (socket)	Flat-nodes, less metadata
Legacy CLI (subprocess)	Flat-nodes (currently same as daemon)	Pinned separately so a future split would surface here
Legacy agent-device fast-runner	Nested tree (not flat)	`findRefByTestID`'s second branch — removing it would fail this test
iOS typeText runner-timeout shim	`{ok:true, data: {typed, text}, meta: {sideEffectSucceeded, runnerTimeoutShim}}`	Must NOT classify as snapshot-failed
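As a rough sketch, the shapes in the table could be written down as TypeScript types, with a small key-set comparison standing in for the parity test. The field names follow the table; the `Rect` fields, the `data.nodes` container, the primitive types inside the shim, and all type and helper names are assumptions.

```typescript
// Assumed rectangle layout (not specified in the PR).
interface Rect { x: number; y: number; width: number; height: number }

// Flat node as listed in the table for the in-tree iOS and Android runners.
interface FlatNode {
  ref: string; // '@e<n>' after mapRunnerNodesToFlat normalization
  type: string;
  rect: Rect;
  label?: string;
  identifier?: string;
  enabled?: boolean;
  hittable?: boolean;
}

// Assumption: flat nodes live under data.nodes.
interface SnapshotSuccess { ok: true; data: { nodes: FlatNode[] } }

// typeText runner-timeout shim; the boolean/string field types are assumed.
interface TypeTextShim {
  ok: true;
  data: { typed: boolean; text: string };
  meta: { sideEffectSucceeded: boolean; runnerTimeoutShim: boolean };
}

const REF_PATTERN = /^@e\d+$/;

const iosFixture: SnapshotSuccess = {
  ok: true,
  data: { nodes: [{ ref: "@e0", type: "Button", rect: { x: 0, y: 0, width: 100, height: 44 }, identifier: "submit", enabled: true, hittable: true }] },
};

const androidFixture: SnapshotSuccess = {
  ok: true,
  data: { nodes: [{ ref: "@e0", type: "Button", rect: { x: 0, y: 0, width: 120, height: 48 }, identifier: "submit", enabled: true, hittable: true }] },
};

const shimFixture: TypeTextShim = {
  ok: true,
  data: { typed: true, text: "hello" },
  meta: { sideEffectSucceeded: true, runnerTimeoutShim: true },
};

// Collects the sorted set of keys used by any node in a snapshot.
function nodeKeys(env: SnapshotSuccess): string[] {
  const keys = new Set<string>();
  env.data.nodes.forEach((node) => Object.keys(node).forEach((k) => keys.add(k)));
  return Array.from(keys).sort();
}

// Parity check: both platforms must expose the same node fields.
function sameNodeShape(a: SnapshotSuccess, b: SnapshotSuccess): boolean {
  return nodeKeys(a).join(",") === nodeKeys(b).join(",");
}
```

Comparing key sets rather than whole objects keeps the parity check insensitive to platform-specific values such as rectangle sizes.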

Consumers exercised

findRefByTestID — both flat-nodes and nested-tree branches, plus empty-nodes/no-match returns
snapshotEnvelopeFailed — the critical Phase 128 #5/#6 distinction between empty-nodes success (TESTID_NOT_FOUND downstream) and snapshot-infrastructure failure (SNAPSHOT_FAILED)
Edge cases: null/undefined/empty string/malformed JSON all classified as failed
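The behaviour listed above can be approximated as follows. This is a sketch, not the code in device-batch.ts: it assumes the consumers receive the envelope as a JSON string, that flat nodes sit under `data.nodes`, and that nested-tree nodes carry a `children` array. The `Sketch` suffix marks both functions as hypothetical stand-ins.

```typescript
type Json = Record<string, unknown>;

// Narrows an unknown value to a plain object, or returns null.
function asObject(v: unknown): Json | null {
  return typeof v === "object" && v !== null && !Array.isArray(v) ? (v as Json) : null;
}

// null, undefined, empty string, and malformed JSON all parse to null.
function parseEnvelope(raw: string | null | undefined): Json | null {
  if (!raw) return null;
  try {
    return asObject(JSON.parse(raw));
  } catch {
    return null;
  }
}

// Failed means unparseable or not ok:true. An ok:true envelope with empty
// nodes, or with no nodes at all (the typeText shim), is NOT a failure.
function snapshotEnvelopeFailedSketch(raw: string | null | undefined): boolean {
  const env = parseEnvelope(raw);
  return env === null || env.ok !== true;
}

function findRefByTestIDSketch(raw: string | null | undefined, testID: string): string | null {
  const env = parseEnvelope(raw);
  if (env === null || env.ok !== true) return null; // refuse to scan a failed snapshot
  const data = asObject(env.data);
  if (data === null) return null;

  // Branch 1: flat nodes.
  if (Array.isArray(data.nodes)) {
    for (const item of data.nodes) {
      const node = asObject(item);
      if (node && node.identifier === testID && typeof node.ref === "string") return node.ref;
    }
    return null;
  }

  // Branch 2: nested tree, walked depth-first (children field is assumed).
  const stack: unknown[] = [data.tree];
  while (stack.length > 0) {
    const node = asObject(stack.pop());
    if (node === null) continue;
    if (node.identifier === testID && typeof node.ref === "string") return node.ref;
    if (Array.isArray(node.children)) stack.push(...node.children);
  }
  return null;
}
```

In this sketch an empty-nodes success yields `null` from the lookup while the failure classifier returns `false`, which is what lets a caller report TESTID_NOT_FOUND rather than SNAPSHOT_FAILED.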

What codex-pair caught during review

Three MED fidelity issues — a contract test with wrong fixtures is worse than no contract test because it gives false confidence. All fixed pre-commit:

My initial in-tree fixtures used ref: 'app-0' style with parentIndex/depth, but mapRunnerNodesToFlat emits @e<n> refs with type/rect/enabled/hittable — verified by reading both rn-fast-runner-client.ts:488 and rn-android-runner-client.ts:230.
The in-tree failure fixture used the raw HTTP error shape {error: {message, code}} — but MCP consumers see the post-failResult shape {ok:false, error: string, code: string} (per failResult(message, code) at rn-fast-runner-client.ts:564).
Comment claimed daemon + CLI were pinned separately but only daemon was. Added a separate CLI fixture so the claim is honest.
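The second fidelity issue is easier to see side by side. The sketch below contrasts the raw HTTP error shape with the post-failResult shape quoted above; `failResultSketch` only mirrors the `(message, code)` signature mentioned in this PR, and its body, the type names, and the sample error code are assumptions.

```typescript
// Raw shape returned over HTTP by the runner, per the description above.
interface RawHttpError { error: { message: string; code: string } }

// Shape MCP consumers see after normalization.
interface FailEnvelope { ok: false; error: string; code: string }

// Hypothetical stand-in for failResult(message, code).
function failResultSketch(message: string, code: string): FailEnvelope {
  return { ok: false, error: message, code };
}

// Flattens the nested error object into the consumer-facing envelope.
function toFailEnvelope(raw: RawHttpError): FailEnvelope {
  return failResultSketch(raw.error.message, raw.error.code);
}

// "RUNNER_UNREACHABLE" is an illustrative code, not one taken from the repo.
const rawError: RawHttpError = { error: { message: "runner unreachable", code: "RUNNER_UNREACHABLE" } };
const consumerView = toFailEnvelope(rawError);
```

A fixture built from `rawError` has no `ok` field and a nested `error` object, so it would exercise a shape that consumers never receive; the fixture should look like `consumerView` instead.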

Test plan

1506/1506 cdp-bridge unit tests passing (+25 net new)
All 5 producer fixtures × findRefByTestID consumer → resolves expected ref by identifier
All 3 failure-envelope fixtures × findRefByTestID → returns null (refuses to scan failed snapshot)
All 5 producer fixtures × snapshotEnvelopeFailed → returns false (success)
All 3 failure fixtures × snapshotEnvelopeFailed → returns true
Empty-nodes success and runner-timeout shim both correctly classified as NOT-failed
codex-pair clean on final commit
CI green
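The fixture × consumer matrix in the test plan lends itself to a table-driven layout. The following is a minimal sketch under assumed names (`ContractCase`, `runMatrix`) with an inline stand-in classifier in place of the real snapshotEnvelopeFailed; it collects mismatches instead of depending on a particular test runner.

```typescript
type Env = { ok: boolean; [key: string]: unknown };

interface ContractCase {
  name: string;
  envelope: Env;
  expectFailed: boolean;
}

// Illustrative fixtures only; the real suite pins one fixture per producer.
const cases: ContractCase[] = [
  { name: "ios-flat", envelope: { ok: true, data: { nodes: [{ ref: "@e0", identifier: "submit" }] } }, expectFailed: false },
  { name: "empty-nodes", envelope: { ok: true, data: { nodes: [] } }, expectFailed: false },
  { name: "ios-typeText-shim", envelope: { ok: true, data: { typed: true, text: "hi" }, meta: { sideEffectSucceeded: true, runnerTimeoutShim: true } }, expectFailed: false },
  { name: "fail-envelope", envelope: { ok: false, error: "boom", code: "X" }, expectFailed: true },
];

// Runs every case through the consumer and returns one message per mismatch.
function runMatrix(table: ContractCase[], consumer: (env: Env) => boolean): string[] {
  const failures: string[] = [];
  for (const c of table) {
    const got = consumer(c.envelope);
    if (got !== c.expectFailed) {
      failures.push(`${c.name}: expected failed=${c.expectFailed}, got failed=${got}`);
    }
  }
  return failures;
}

// Stand-in classifier: anything that is not ok:true counts as failed.
const classifier = (env: Env): boolean => env.ok !== true;

const mismatches = runMatrix(cases, classifier);
```

Naming each case means a regression, for example a classifier that starts treating a missing `nodes` array as a failure, is reported against the specific producer it breaks.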

Refs

Test coverage gap: agent-device dispatch tiers (fast-runner / daemon / CLI) under realistic envelope shapes #114 — original coverage-gap issue
PR fix(action-store): atomic YAML+sidecar pair-write + handler integration tests — v0.44.15 #109 — multi-LLM review (Codex finding 6) that originally surfaced the gap
scripts/cdp-bridge/src/runners/rn-fast-runner-client.ts:488 (mapRunnerNodesToFlat)
scripts/cdp-bridge/src/runners/rn-android-runner-client.ts:230 (Android mapRunnerNodesToFlat)
scripts/cdp-bridge/src/tools/device-batch.ts:61 (findRefByTestID)
scripts/cdp-bridge/src/tools/device-batch.ts:111 (snapshotEnvelopeFailed)

🤖 Generated with Claude Code


…oses #114)

Adds 25 contract tests that pin the envelope shapes each dispatch tier emits against the handler-side parsers that consume them, so a future divergence on either side fails fast in CI rather than slipping through to production.

Background: handler integration tests stub `runAgentDevice` via `_setRunAgentDeviceForTest` with synthetic envelopes. Codex flagged on PR #109 (conf 80) that this short-circuits the real wrapper's three dispatch tiers, each of which produces subtly different shapes. The original issue listed agent-device's internal tiers (fast-runner HTTP / daemon socket / CLI), but post-PR #164 (iOS-MVP) and PR #165 (Android-MVP) the surface widened to include the in-tree iOS and Android runner clients too. This PR covers the current surface.

Producers pinned:

1. In-tree iOS runner (rn-fast-runner-client.runIOS) — flat nodes with `{ref: '@e<n>', type, rect, label?, identifier?, enabled?, hittable?}` shape after mapRunnerNodesToFlat normalization
2. In-tree Android runner (rn-android-runner-client.runAndroid) — identical flat-node shape (the parity test pins this — a divergence here would silently break platform-agnostic handlers)
3. Legacy upstream agent-device daemon socket — flat-nodes with less metadata
4. Legacy upstream agent-device CLI subprocess — separate fixture even though current shape equals daemon, so a future divergence would surface here
5. Legacy upstream agent-device internal fast-runner sub-tier — nested-tree shape, NOT flat. findRefByTestID's `env.data.tree` branch handles this; removing the branch without warning would fail this test
6. iOS XCUIElement.typeText runner-timeout shim — `{ok:true, data: {typed, text}, meta: {sideEffectSucceeded, runnerTimeoutShim}}`. snapshotEnvelopeFailed must NOT report this as a failure (it would route every successful iOS fill to SNAPSHOT_FAILED otherwise)

Consumers exercised:

- findRefByTestID (device-batch.ts) — both flat-nodes and nested-tree branches
- snapshotEnvelopeFailed (device-batch.ts) — including the critical distinction between empty-nodes success (TESTID_NOT_FOUND) and snapshot-infrastructure failure (SNAPSHOT_FAILED), per Phase 128 #5/#6
- Edge cases: null/undefined/empty/malformed JSON all classified as failed

codex-pair caught three fidelity issues during review (MED): my initial fixtures used `ref: 'app-0'` style refs with `parentIndex`/`depth` fields, but the actual mapRunnerNodesToFlat output emits `@e<n>` refs with `type`/`rect`/`enabled`/`hittable`. The failure fixture used the raw HTTP error shape `{error: {message, code}}` instead of the post-failResult `{ok:false, error: string, code: string}` shape MCP consumers actually see. And the comment claimed daemon + CLI were pinned separately but only daemon was. All three fixed before commit — a contract test with the wrong fixtures is worse than no contract test because it gives false confidence.

Verified: 1506/1506 cdp-bridge unit tests passing (+25 net new).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Lykhoyda merged commit 9e0a586 into main May 20, 2026
7 checks passed

Lykhoyda deleted the test/gh-114-envelope-contract branch May 20, 2026 12:28

Lykhoyda merged 1 commit into main from test/gh-114-envelope-contract
